Goto

Collaborating Authors

 balance exploration


DRTA: Dynamic Reward Scaling for Reinforcement Learning in Time Series Anomaly Detection

arXiv.org Artificial Intelligence

Anomaly detection in time series data is important for applications in finance, healthcare, sensor networks, and industrial monitoring. Traditional methods usually struggle with limited labeled data, high false-positive rates, and difficulty generalizing to novel anomaly types. To overcome these challenges, we propose a reinforcement learning-based framework that integrates dynamic reward shaping, Variational Autoencoder (VAE), and active learning, called DRTA. Our method uses an adaptive reward mechanism that balances exploration and exploitation by dynamically scaling the effect of VAE-based reconstruction error and classification rewards. This approach enables the agent to detect anomalies effectively in low-label systems while maintaining high precision and recall. Our experimental results on the Yahoo A1 and Yahoo A2 benchmark datasets demonstrate that the proposed method consistently outperforms state-of-the-art unsupervised and semi-supervised approaches. These findings show that our framework is a scalable and efficient solution for real-world anomaly detection tasks.


Solving Multi-arm Bandits with Python - Analytics Vidhya

#artificialintelligence

On entering casinos, there would be multiple machines that could make us win more money and some machines that can make us bankrupt. How nice would it be if we knew the working of the machines so that we could leverage the maximum benefit out of it? The multi-arm bandit is one such machine from which we can get the maximum benefit. Instead of relying on random chance, we go for a systematic approach by simply pulling random levers. Let's try to understand what it is and the different strategies to solve it.


Exploring Unknown States with Action Balance

arXiv.org Artificial Intelligence

Exploration is a key problem in reinforcement learning. Recently bonus-based methods have achieved considerable successes in environments where exploration is difficult such as Montezuma's Revenge, which assign additional bonus (e.g., intrinsic reward) to guide the agent to rarely visited states. Since the bonus is calculated according to the novelty of the next state after performing an action, we call such methods the next-state bonus methods. However, the next-state bonus methods bring extra issues. It may lead agent to be trapped in states that fewer being visited and ignore to explore unknown states. Moreover, the behavior policy of the agent is also influenced by the bonus added to the state (or state-action) values indirectly. In contrast to the bonus-based methods which explore in known states, in this paper, we focus on the other part of exploration: exploration for finding unknown states. We propose the action balance exploration method to overcome the defects of the next-state bonus methods, which balances the chosen time of each action in each state and can be treated as an extension of upper confidence bound (UCB) to deep reinforcement learning. To take both the advantages of the next-state bonus method and our action balance exploration method, we propose the action balance RND method, which takes both parts of exploration into consideration. The experiments on grid world and Atari games demonstrate action balance exploration has a better capability in finding unknown states and can improve the real performance of RND in some hard exploration environments respectively.